Bioinformatics A Practical Guide to Next Generation Sequencing Data Analysis (Hamid D. Ismail)


-r taxonomic_level \

-o kaiju_output/ERR1823608_table.tsv \

kaiju_output/ERR1823608.out \

-l taxonomic,levels,separated,by,commas

Run “kaiju2table” to learn about the usage and options of this command.

Most taxonomic classifiers for metagenomic data follow the same two steps: downloading a reference database and then classifying the reads. For almost all of them, these steps require more storage space and memory than a regular desktop computer may offer. If computational resources are limited, we can use Centrifuge, which requires relatively little storage space and memory and can run on a personal computer.
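Before downloading any database, it can help to check how much disk space and memory the machine actually has. The following is a minimal sketch, not part of Centrifuge: the function names are hypothetical, and the memory check assumes a Linux system that exposes /proc/meminfo.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Centrifuge) for a quick resource check.

# Print the free space, in whole GB, on the filesystem that holds
# directory $1 (default: the current directory).
free_disk_gb() {
    df -Pk "${1:-.}" | awk 'NR==2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# Print the total physical memory in whole GB.
# Assumption: Linux, where /proc/meminfo reports MemTotal in kB.
total_mem_gb() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' /proc/meminfo
}

# Succeed only if at least $1 GB are free in directory $2 (default: ".").
enough_disk() {
    [ "$(free_disk_gb "${2:-.}")" -ge "$1" ]
}

# Example use:
#   enough_disk 12 . && echo "enough space" || echo "not enough space"
```

The threshold passed to enough_disk is up to the reader; it should be at least the size of the index to be downloaded, plus room for intermediate files.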

The Centrifuge classifier is available at “https://github.com/infphilo/centrifuge”. Visit that site for up-to-date installation instructions. At the time of writing, you can install it on Linux using the following commands:

git clone https://github.com/infphilo/centrifuge

cd centrifuge

make

sudo make install prefix=/usr/local

If the installation succeeds, no further setup is needed and Centrifuge can be run from any directory. Run “centrifuge -h” to display the usage and options.
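One way to confirm that the installation put the programs on the search path is sketched below. The function check_tools is a hypothetical helper, not part of Centrifuge; it only reports whether each named program can be found on PATH.

```shell
#!/bin/sh
# Hypothetical helper: report whether each named program is on PATH.
# Returns 0 if all were found and 1 if at least one is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: found"
        else
            echo "$tool: missing"
            missing=1
        fi
    done
    return "$missing"
}

# Example use after installation:
#   check_tools centrifuge centrifuge-build centrifuge-download
```

If a program is reported as missing, the install prefix (here /usr/local) may not have its bin directory on PATH.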

As usual, to use the Centrifuge classifier, we begin by obtaining an index. Several ready-to-use indexes are available at http://www.ccb.jhu.edu/software/centrifuge. Building an index ourselves requires sequence files, taxonomy files, and a map from sequence IDs to taxonomy IDs. This can be simplified with the “make” command, which can build several standard and custom indexes. To do that, change into the “indices” directory inside the Centrifuge directory and run “make” with the target for the index you want, as follows:

cd indices

make p+h+v # bacterial, human, and viral genomes [~12G]

make p_compressed # bacterial genomes compressed at the species level [~4.2G]

make p_compressed+h+v # combination of the two above [~8G]

Each of these commands downloads the reference taxonomy files and the reference genomes at the assembly level and then builds the index; you only need to run the one for the index you want. The download may take a while depending on the speed of the Internet connection. Alternatively, it is easier to download a prebuilt index from the Centrifuge homepage; see “https://ccb.jhu.edu/software/centrifuge/manual.shtml”. Centrifuge assigns taxa to the short reads in the FASTQ files. For the “-x” option, provide the index name (the file name without the trailing “.1.cf”-style suffix), including its path if the index is not in the current directory.
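A common source of errors with “-x” is a prefix that does not point at the index files. The sketch below checks a prefix before classification is started. It assumes that an index consists of three files ending in .1.cf, .2.cf, and .3.cf (some Centrifuge versions may add further files); index_ready is a hypothetical helper, not a Centrifuge command.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the index files for prefix $1 exist.
# Assumption: an index is stored as <prefix>.1.cf, <prefix>.2.cf, <prefix>.3.cf.
index_ready() {
    for n in 1 2 3; do
        if [ ! -f "$1.$n.cf" ]; then
            echo "missing: $1.$n.cf" >&2
            return 1
        fi
    done
    return 0
}

# Example use, with the index in the current directory or under a path:
#   index_ready p+h+v && echo "index found"
#   index_ready /path/to/indices/p+h+v && echo "index found"
```

The value that passes this check is the same value to give to “-x”.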

mkdir centrifuge_out

centrifuge -x p+h+v \